Analysis of Spatial Data

We are going to analyze a breast cancer section from 10x Genomicsis Visium Gene Expression platform.

For the analysis we will mainly use the R package Seurat.

Download Data

You can download the data from this website or using curl, as shown below. We do not need to do it because it is already in the data folder.

# download files
curl -O https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Breast_Cancer_Block_A_Section_1/V1_Breast_Cancer_Block_A_Section_1_filtered_feature_bc_matrix.h5
curl -O https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Breast_Cancer_Block_A_Section_1/V1_Breast_Cancer_Block_A_Section_1_spatial.tar.gz

Libraries

These are the libraries we are going to use:

# set seed for reproducibility purposes
set.seed(12345)

# create palette for the cell types
cell_type_palette <- as.vector(pals::polychrome())

Load data with Seurat

se_obj <- Load10X_Spatial(
  data.dir = "data"
)

The visium data from 10x consists is stored in a Seurat object as:

  • A spot x gene expression matrix (similar to the cell x gene matrix)
se_obj[["Spatial"]][1:5, 1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
##             AAACAAGTATCTCCCA-1 AAACACCAATAACTGC-1 AAACAGAGCGACTCCT-1
## MIR1302-2HG                  .                  .                  .
## FAM138A                      .                  .                  .
## OR4F5                        .                  .                  .
## AL627309.1                   .                  .                  .
## AL627309.3                   .                  .                  .
##             AAACAGGGTCTATATT-1 AAACAGTGTTCCTGGG-1
## MIR1302-2HG                  .                  .
## FAM138A                      .                  .
## OR4F5                        .                  .
## AL627309.1                   .                  .
## AL627309.3                   .                  .
  • Image of the tissue slice (obtained from staining during sample processing in the lab)
SpatialPlot(
  se_obj, 
  pt.size = 0
  ) + 
  NoLegend()

  • Scaling factors that relate the original high resolution image to the lower resolution image used here for visualization.
se_obj@images$slice1@scale.factors
## $spot
## [1] 0.08250825
## 
## $fiducial
## [1] 286.7012
## 
## $hires
## [1] 0.08250825
## 
## $lowres
## [1] 0.02475247
## 
## attr(,"class")
## [1] "scalefactors"

Quality control

As with single-cell objects, we have some important features that we can use to filter out bad quality spots.

  • nCount_Spatial: number of UMIs per spot
plot_1 <- VlnPlot(
  se_obj, 
  features = "nCount_Spatial", 
  pt.size = 0.1
  ) + 
  NoLegend()

plot_2 <- SpatialFeaturePlot(
  se_obj, 
  features = "nCount_Spatial"
  )

plot_1 | plot_2

The variability in the distribution of UMIs is related to the tissue architecture, i.e. tumoral regions are more densely packked and thus contain spots with higher counts.

  • nFeature_Spatial: number of genes per spot
plot_1 <- VlnPlot(
  se_obj, 
  features = "nFeature_Spatial", 
  pt.size = 0.1
  ) + 
  NoLegend()

plot_2 <- SpatialFeaturePlot(
  se_obj, 
  features = "nFeature_Spatial"
  )

plot_1 | plot_2

Here we can also see how the number of genes per spot correlates with the structure of the tissue

  • mt.content and rb.content: mitochondrial and ribosomal content per spot, respectively

We have to compute this two values by calculating the percentage of reads per spot that belong to mitochondrial/ribosomal genes.

# Mitochondrial content
se_obj[["mt.content"]] <- PercentageFeatureSet(
  object = se_obj,
  pattern = "^MT-")

summary(se_obj[["mt.content"]])
##    mt.content    
##  Min.   : 1.118  
##  1st Qu.: 2.761  
##  Median : 3.504  
##  Mean   : 4.009  
##  3rd Qu.: 4.678  
##  Max.   :14.415
# Ral contentibosomal
se_obj[["rb.content"]] <- PercentageFeatureSet(
  object = se_obj,
  pattern = "^RPL|^RPS")

summary(se_obj[["rb.content"]])
##    rb.content    
##  Min.   : 6.697  
##  1st Qu.:10.765  
##  Median :11.752  
##  Mean   :11.958  
##  3rd Qu.:12.923  
##  Max.   :21.482
SpatialFeaturePlot(
  se_obj, 
  features = c("mt.content", "rb.content")
  )

In the case of spatial data, high mitochondrial content is not an indicator of bad quality spots and thus should not be a criterion to filter out spots.

We are going to filter out genes that have no expression in the tissue.

table(rowSums(as.matrix(se_obj@assays$Spatial@counts)) == 0)
## 
## FALSE  TRUE 
## 24923 11678
keep_genes <- rowSums(as.matrix(se_obj@assays$Spatial@counts)) != 0
se_obj <- se_obj[keep_genes, ]

Furthermore, we will filter out spots with very low number of counts (< 500) before proceeding with the downstream analysis.

se_obj <- subset(
  se_obj,
  subset = nCount_Spatial > 500
)

Preprocessing

Similar to single-cell datasets, preprocessing for spatial data requires normalizing, finding variable features and scaling the counts.

se_obj <- NormalizeData(se_obj)
se_obj <- FindVariableFeatures(se_obj)
se_obj <- ScaleData(se_obj)

Prior to clustering, we perform dimensionality reduction.

se_obj <- RunPCA(se_obj)
se_obj <- RunUMAP(
  se_obj,
  reduction = "pca",
  dims = 1:20
)

Clustering and visualization

se_obj <- FindNeighbors(
  se_obj, 
  reduction = "pca", 
  dims = 1:20
  )

se_obj <- FindClusters(
  se_obj,
  resolution = 0.3
  )
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 3798
## Number of edges: 132786
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9245
## Number of communities: 9
## Elapsed time: 0 seconds
plot_1 <- DimPlot(
  se_obj, 
  label = TRUE
  )

plot_2 <- SpatialDimPlot(
  se_obj, 
) +
  NoLegend()

plot_1 + plot_2

Gene expression and annotation

ESR1 and ERBB2 are the most common mutations in breast cancer, so one way of annotating the tissue is by looking at ESR1 and ERBB2 positive/negative regions.

SpatialFeaturePlot(
  se_obj, 
  features = c("ESR1", "ERBB2"), 
  alpha = c(0.1, 1)
  )

Deconvolution

We are going to use SPOTlight in conjunction with a subset of the Tumor Immune Cell Atlas to deconvolute our spots and map immune populations to our tumor section.

# load library
library(SPOTlight)

Load downsampled version of the Tumor Immune Cell Atlas and explore the metadata we have. We are going to use annotation level 1 for the deconvolution (lv1_annot).

atlas <- readRDS("data/TICAtlas_downsample.rds")
head(atlas@meta.data)
##                patient nCount_RNA nFeature_RNA percent.mt  gender subtype
## MEL2_2_W462024  MEL2_2       1888          831   7.097458 unknown      CM
## MEL2_2_W462047  MEL2_2       1556          734  11.053985 unknown      CM
## MEL2_2_W462184  MEL2_2       1895          846  16.569921 unknown      CM
## MEL2_2_W462249  MEL2_2       1672          806   8.193780 unknown      CM
## MEL2_1_W462337  MEL2_1        570          310  10.701754 unknown      CM
## MEL2_2_W462742  MEL2_2       1214          620  12.602965 unknown      CM
##                   source                lv1_annot                    lv2_annot
## MEL2_2_W462024 melanoma2            T cells naive        CD4 memory stem cells
## MEL2_2_W462047 melanoma2            CD8 cytotoxic                CD8 cytotoxic
## MEL2_2_W462184 melanoma2      CD8 effector memory                           NK
## MEL2_2_W462249 melanoma2            T cells naive        CD4 memory stem cells
## MEL2_1_W462337 melanoma2         CD4 naive-memory        CD4 memory stem cells
## MEL2_2_W462742 melanoma2 CD8 terminally exhausted CD4 resident effector memory

First of all we need to compute the markers for the cell types using Seurat’s funtion FindAllMarkers.

Idents(atlas) <- "lv1_annot"
  
sc_markers <- FindAllMarkers(
  object = atlas,
  assay = "RNA",
  slot = "data",
  only.pos = TRUE
  )

sc_markers <- filter(
  sc_markers,
  avg_log2FC >= 0.7
  )

sc_markers <- filter(
  sc_markers,
  str_detect(string = gene,
             pattern = "^Rps|^Rpl|^Mt-",
             negate = TRUE
             )
  )
saveRDS(sc_markers, "R_obj/filtered_atlas_markers.rds")
sc_markers <- readRDS("R_obj/filtered_atlas_markers.rds")
sc_markers %>% 
  group_by(cluster) %>% 
  top_n(5, wt = avg_log2FC)
## # A tibble: 105 x 7
## # Groups:   cluster [21]
##       p_val avg_log2FC pct.1 pct.2 p_val_adj cluster       gene  
##       <dbl>      <dbl> <dbl> <dbl>     <dbl> <fct>         <chr> 
##  1 1.00e-29       1.77  0.5  0.059  1.06e-24 T cells naive FLT3LG
##  2 1.23e-27       2.08  0.62 0.107  1.31e-22 T cells naive TCF7  
##  3 1.72e-21       1.68  0.74 0.196  1.83e-16 T cells naive IL7R  
##  4 1.82e-19       1.63  0.46 0.079  1.93e-14 T cells naive CCR7  
##  5 3.43e-12       1.66  0.64 0.23   3.64e- 7 T cells naive LTB   
##  6 1.14e-20       1.77  0.76 0.202  1.21e-15 CD8 cytotoxic GZMA  
##  7 1.58e-18       2.00  0.98 0.416  1.67e-13 CD8 cytotoxic CCL5  
##  8 2.12e- 8       1.32  0.68 0.331  2.25e- 3 CD8 cytotoxic CD2   
##  9 2.58e- 5       1.39  0.36 0.155  1   e+ 0 CD8 cytotoxic ARPC5L
## 10 2.78e- 4       1.34  0.18 0.055  1   e+ 0 CD8 cytotoxic CSF1  
## # ... with 95 more rows

Run the deconvolution using the scRNAseq atlas and the spatial transcriptomics data.

decon_mtrx_ls <- spotlight_deconvolution(
  se_sc = atlas,
  counts_spatial = se_obj@assays$Spatial@counts,
  clust_vr = "lv1_annot",
  cluster_markers = sc_markers,
  hvg = 0,
  min_cont = 0,
  assay = "RNA",
  slot = "counts")

saveRDS(decon_mtrx_ls, "R_obj/deconvolution_matrix.rds")
decon_mtrx_ls <- readRDS("R_obj/deconvolution_matrix.rds")

Before even looking at the decomposed spots we can gain insight on how well the model performed by looking at the topic profiles for the cell types.

nmf_mod <- decon_mtrx_ls[[1]]
decon_mtrx <- decon_mtrx_ls[[2]]

rownames(decon_mtrx) <- colnames(se_obj)

# info on the NMF model
nmf_mod[[1]]
## <Object of class: NMFfit>
##  # Model:
##   <Object of class:NMFns>
##   features: 2609 
##   basis/rank: 21 
##   samples: 1043 
##   theta: 0.5 
##  # Details:
##   algorithm:  nsNMF 
##   seed:  NMF 
##   RNG: 10403L, 624L, ..., -689249108L [e0d3d05f1b721ce7d02aa5d9b936a60e]
##   distance metric:  'KL' 
##   residuals:  988388.3 
##   Iterations: 2000 
##   Timing:
##      user  system elapsed 
##    741.69    2.12  744.81
# deconvolution matrix
head(decon_mtrx)
##                        B.cells CD4.effector.memory CD4.naive.memory
## AAACAAGTATCTCCCA-1 0.000000000        4.787970e-18                0
## AAACACCAATAACTGC-1 0.109467376        3.278547e-02                0
## AAACAGAGCGACTCCT-1 0.152343572        1.493283e-02                0
## AAACAGGGTCTATATT-1 0.462152569        1.029516e-17                0
## AAACAGTGTTCCTGGG-1 0.000000000        4.114186e-02                0
## AAACATTTCCCGGATT-1 0.001411841        2.556300e-02                0
##                    CD4.recently.activated CD4.transitional.memory CD8.cytotoxic
## AAACAAGTATCTCCCA-1                      0              0.28183238             0
## AAACACCAATAACTGC-1                      0              0.27723642             0
## AAACAGAGCGACTCCT-1                      0              0.50784627             0
## AAACAGGGTCTATATT-1                      0              0.09591864             0
## AAACAGTGTTCCTGGG-1                      0              0.17426676             0
## AAACATTTCCCGGATT-1                      0              0.48668046             0
##                    CD8.effector.memory CD8.pre.exhausted
## AAACAAGTATCTCCCA-1                   0        0.03728053
## AAACACCAATAACTGC-1                   0        0.07940511
## AAACAGAGCGACTCCT-1                   0        0.13425242
## AAACAGGGTCTATATT-1                   0        0.05342847
## AAACAGTGTTCCTGGG-1                   0        0.08483890
## AAACATTTCCCGGATT-1                   0        0.07119686
##                    CD8.terminally.exhausted        cDC Macrophages.SPP1 mDC
## AAACAAGTATCTCCCA-1               0.17089407 0.04239509                0   0
## AAACACCAATAACTGC-1               0.00000000 0.01880710                0   0
## AAACAGAGCGACTCCT-1               0.00000000 0.02648264                0   0
## AAACAGGGTCTATATT-1               0.09249216 0.06192265                0   0
## AAACAGTGTTCCTGGG-1               0.00000000 0.03689448                0   0
## AAACATTTCCCGGATT-1               0.00000000 0.02745349                0   0
##                    Monocytes          NK        pDC Plasma.B.cells
## AAACAAGTATCTCCCA-1         0 0.153720000 0.02090671     0.04565245
## AAACACCAATAACTGC-1         0 0.000000000 0.07860440     0.05336292
## AAACAGAGCGACTCCT-1         0 0.000000000 0.02741362     0.01610836
## AAACAGGGTCTATATT-1         0 0.003493694 0.07979980     0.06826405
## AAACAGTGTTCCTGGG-1         0 0.000000000 0.08817366     0.06997692
## AAACATTTCCCGGATT-1         0 0.000000000 0.07421472     0.06674091
##                    T.cells.naive T.cells.regulatory T.helper.cells  TAMs.C1QC
## AAACAAGTATCTCCCA-1             0          0.0000000    0.000000000 0.13375576
## AAACACCAATAACTGC-1             0          0.0000000    0.233590096 0.06219759
## AAACAGAGCGACTCCT-1             0          0.0000000    0.008634505 0.04868357
## AAACAGGGTCTATATT-1             0          0.0000000    0.000000000 0.05379755
## AAACAGTGTTCCTGGG-1             0          0.1532738    0.139836908 0.11184478
## AAACATTTCCCGGATT-1             0          0.0000000    0.157483558 0.07646224
##                    TAMs.proinflamatory    res_ss
## AAACAAGTATCTCCCA-1          0.11356301 0.4351546
## AAACACCAATAACTGC-1          0.05454352 0.5368353
## AAACAGAGCGACTCCT-1          0.06330221 0.5212597
## AAACAGGGTCTATATT-1          0.02873041 0.4487828
## AAACAGTGTTCCTGGG-1          0.09975192 0.4937794
## AAACATTTCCCGGATT-1          0.01279293 0.5570431

Look at how specific the topic profiles are for each cell type.

h <- NMF::coef(nmf_mod[[1]])
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- SPOTlight::dot_plot_profiles_fun(
  h = h,
  train_cell_clust = nmf_mod[[2]])

topic_profile_plts[[2]]

Let’s now visualize how the deconvoluted spots on the the Seurat object.

# keep on
decon_mtrx <- decon_mtrx_ls[[2]]
decon_mtrx <- decon_mtrx[, colnames(decon_mtrx) != "res_ss"]
decon_mtrx <- decon_mtrx[, !is.na(colnames(decon_mtrx))]
se_obj@meta.data <- cbind(se_obj@meta.data, decon_mtrx)
saveRDS(se_obj, "R_obj/breast_slide_deconvoluted.rds")

We have an object with the deconvolution information already added:

se_obj <- readRDS("R_obj/breast_slide_deconvoluted.rds")
ct <- colnames(decon_mtrx)

scatterpie_plot(
    se_obj = se_obj,
    cell_types_all = ct,
    pie_scale = 0.3
    ) +
  coord_fixed(ratio = 1) +
  scale_fill_manual(values = cell_type_palette)
## [1] "Using slice slice1"

Look at the spatial scatterpie by looking at cell types which are not present throughout the entire tissue

# keep only cell types that are present in less than 80% of the spots
keep_0.8 <- colSums(se_obj@meta.data[, ct] > 0) < 0.8 * ncol(se_obj)
# but not those that were not found on the tissue
keep_g0 <- colSums(se_obj@meta.data[, ct] > 0) > 0

# select cell types fullfiling the conditions
ct_var <- colnames(se_obj@meta.data[, ct])[keep_0.8 & keep_g0]

ct_var
## [1] "B.cells"                  "CD8.terminally.exhausted"
## [3] "Monocytes"                "NK"                      
## [5] "T.cells.regulatory"       "T.helper.cells"
scatterpie_plot(
  se_obj = se_obj,
  cell_types_all = ct_var,
  pie_scale = 0.3
  ) +
  coord_fixed(ratio = 1) +
  scale_fill_manual(values = cell_type_palette)
## [1] "Using slice slice1"

Lets now see the frequencies of selected cell types across the regions in the tissue. We will focus on cytotoxic T cells and macrophages as they can give an idea of how infiltrated the tumor can be.

VlnPlot(
  se_obj,
  features = c("CD8.terminally.exhausted", "TAMs.proinflamatory"),
  group.by = "seurat_clusters"
)

plot_1 <- SpatialPlot(
  se_obj, 
  pt.size = 0
  ) + 
  NoLegend()

plot_2 <- SpatialDimPlot(
  se_obj, 
  group.by = "seurat_clusters"
)

plot_1 + plot_2

We can see that in region of cluster 2 we have a presence of both exhausted T cells and proinflamatory macrophages.


Back to top

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
## [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
## [5] LC_TIME=Spanish_Spain.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] imager_0.42.11      magrittr_2.0.1      cowplot_1.1.1      
##  [4] NMF_0.23.0          Biobase_2.53.0      BiocGenerics_0.40.0
##  [7] cluster_2.1.2       rngtools_1.5.2      pkgmaker_0.32.2    
## [10] registry_0.5-1      SPOTlight_0.1.7     patchwork_1.1.1    
## [13] SeuratObject_4.0.4  Seurat_4.0.6        forcats_0.5.1      
## [16] stringr_1.4.0       dplyr_1.0.7         purrr_0.3.4        
## [19] readr_2.1.1         tidyr_1.1.4         tibble_3.1.6       
## [22] ggplot2_3.3.5       tidyverse_1.3.1    
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2            reticulate_1.22       tidyselect_1.1.1     
##   [4] htmlwidgets_1.5.4     grid_4.1.0            Rtsne_0.15           
##   [7] scatterpie_0.1.7      munsell_0.5.0         codetools_0.2-18     
##  [10] ica_1.0-2             future_1.23.0         miniUI_0.1.1.1       
##  [13] withr_2.4.3           colorspace_2.0-2      highr_0.9            
##  [16] knitr_1.37            rstudioapi_0.13       ROCR_1.0-11          
##  [19] tensor_1.5            listenv_0.8.0         labeling_0.4.2       
##  [22] polyclip_1.10-0       bit64_4.0.5           farver_2.1.0         
##  [25] parallelly_1.30.0     vctrs_0.3.8           generics_0.1.1       
##  [28] xfun_0.29             R6_2.5.1              doParallel_1.0.16    
##  [31] ggbeeswarm_0.6.0      pals_1.7              hdf5r_1.3.5          
##  [34] spatstat.utils_2.3-0  assertthat_0.2.1      promises_1.2.0.1     
##  [37] scales_1.1.1          beeswarm_0.4.0        gtable_0.3.0         
##  [40] Cairo_1.5-12.2        globals_0.14.0        bmp_0.3              
##  [43] goftest_1.2-3         rlang_0.4.12          splines_4.1.0        
##  [46] lazyeval_0.2.2        dichromat_2.0-0       spatstat.geom_2.3-1  
##  [49] broom_0.7.10          yaml_2.2.1            reshape2_1.4.4       
##  [52] abind_1.4-5           modelr_0.1.8          backports_1.4.1      
##  [55] httpuv_1.6.4          tools_4.1.0           gridBase_0.4-7       
##  [58] ellipsis_0.3.2        spatstat.core_2.3-2   jquerylib_0.1.4      
##  [61] RColorBrewer_1.1-2    ggridges_0.5.3        Rcpp_1.0.7           
##  [64] plyr_1.8.6            rpart_4.1-15          deldir_1.0-6         
##  [67] pbapply_1.5-0         zoo_1.8-9             haven_2.4.3          
##  [70] ggrepel_0.9.1         fs_1.5.2              data.table_1.14.2    
##  [73] RSpectra_0.16-0       scattermore_0.7       readbitmap_0.1.5     
##  [76] lmtest_0.9-39         reprex_2.0.1          RANN_2.6.1           
##  [79] fitdistrplus_1.1-6    matrixStats_0.61.0    hms_1.1.1            
##  [82] mime_0.12             evaluate_0.14         xtable_1.8-4         
##  [85] jpeg_0.1-9            readxl_1.3.1          gridExtra_2.3        
##  [88] compiler_4.1.0        maps_3.4.0            KernSmooth_2.23-20   
##  [91] crayon_1.4.2          htmltools_0.5.2       ggfun_0.0.4          
##  [94] mgcv_1.8-38           later_1.3.0           tzdb_0.2.0           
##  [97] tiff_0.1-10           lubridate_1.8.0       DBI_1.1.2            
## [100] tweenr_1.0.2          dbplyr_2.1.1          MASS_7.3-54          
## [103] Matrix_1.4-0          cli_3.1.0             parallel_4.1.0       
## [106] igraph_1.2.10         pkgconfig_2.0.3       plotly_4.10.0        
## [109] spatstat.sparse_2.1-0 xml2_1.3.3            foreach_1.5.1        
## [112] vipor_0.4.5           bslib_0.3.1           rvest_1.0.2          
## [115] digest_0.6.29         sctransform_0.3.2     RcppAnnoy_0.0.19     
## [118] spatstat.data_2.1-2   rmarkdown_2.11        cellranger_1.1.0     
## [121] leiden_0.3.9          uwot_0.1.11           shiny_1.7.1          
## [124] lifecycle_1.0.1       nlme_3.1-153          jsonlite_1.7.2       
## [127] mapproj_1.2.7         viridisLite_0.4.0     fansi_0.5.0          
## [130] pillar_1.6.4          lattice_0.20-45       ggrastr_1.0.1        
## [133] fastmap_1.1.0         httr_1.4.2            survival_3.2-13      
## [136] glue_1.6.0            png_0.1-7             iterators_1.0.13     
## [139] bit_4.0.4             ggforce_0.3.3         stringi_1.7.6        
## [142] sass_0.4.0            irlba_2.3.5           future.apply_1.8.1